Abstract

Please note: this document is very much work in progress till 10/31 by my estimate. If something
seems off, it probably is. Please email N Santhanam above; you can get notes for clarification.

Compression refers to encoding data using bits, so that the representation uses as few bits as
possible. Compression could be lossless (i.e., encoded data can be recovered exactly from its
representation) or lossy, where the data is compressed more than in the lossless case but can still
be recovered to within a prespecified distortion. In this paper, we prove the optimality of Codelet
Parsing, a quasi-linear time algorithm for lossy compression of sequences of bits that are
independently and identically distributed (iid), under Hamming distortion. Codelet Parsing extends
the lossless Lempel Ziv algorithm to the lossy case, a task that has been a focus of the source
coding literature for the better part of two decades now.

Given iid sequences x, the expected length of the shortest lossy representation such that x can
be reconstructed to within distortion D is given by the rate distortion function, r(D). We prove the
optimality of the Codelet Parsing algorithm for lossy compression of memoryless bit sequences. It
splits the input sequence naturally into phrases, representing each phrase by a codelet, a potentially
distorted phrase of the same length. The codelets in the lossy representation of a length-n string
x have length roughly (log n)/r(D), and like the lossless Lempel Ziv algorithm, Codelet Parsing
constructs codebooks logarithmic in the sequence length.

Introduction 



Kac's lemma [1] for stationary ergodic sources formalizes the connection between the recurrence
times of events and their probabilities. This connection implies an elegant way to recursively
compress sequences from stationary ergodic sources to their entropy, formalized by the Lempel Ziv
algorithm for lossless compression.

The theoretical and commercial importance of the Lempel Ziv algorithm and its variants has been
established not only for compression problems, but also for classification [2] and denoising [3]
algorithms. In addition to their theoretical guarantees, these algorithms have attractive computational
and storage properties, are often entirely data driven, and do not rest on sensitive choices of parameter
values. It is thus not surprising that Lempel Ziv based algorithms form the core of compression
software, including WINZIP, gzip, and the UNIX compress utility. Additionally,
Lempel Ziv compression has had a profound influence on the study of complexity; see, for example,
[4, 5]. For many researchers, this angle perhaps outweighs even the commercial significance of
Lempel Ziv compressors.
Lossy compression

Surprisingly, no algorithms as attractive and simple as the Lempel Ziv algorithm are known for lossy
compression. In fact, in the recent past, some researchers were pessimistic about the problem in
general; see [6] for details. For example, [7, p. 2709] noted that "All universal lossy coding schemes
found to date lack the relative simplicity that imbues Lempel-Ziv coders and arithmetic coders with
economic viability".

Of course, a lot of research continues on lossy source compression algorithms, mainly with an eye 
on the potential theoretical and practical benefits of having such algorithms. 

Prior work 

We present a representative, but necessarily brief and non-exhaustive, review of various known lossy
coding schemes, focussing on algorithmic results. For references to earlier results on the existence
of universal lossy codes involving exponential-time constructions, see Kieffer [8]. We confine our
discussion here to finite discrete source and reproduction alphabets; for an extensive survey of
results for real-valued sources, see [9]. Among these, we are particularly interested in papers that
have focussed on lossy extensions of the Lempel-Ziv algorithm.

Most algorithms have naturally used approximate string matching [10], [11] instead of exact string
matching as in the Lempel-Ziv algorithms. The unresolved question has always been which of the
"approximately matching" representations to choose. Cheung and Wei [12] extended a move-to-front
algorithm to lossy source coding. The algorithm is sub-optimal [13]. Later, Zhang and Wei [14]
proposed a universal, on-line lossy coding algorithm for the fixed-rate case. Morita and Kobayashi [15]
extended the LZW algorithm, but their algorithm is known to be sub-optimal for memoryless sources [13].
Constantinescu and Storer [16], [17] combined ideas from lossless Lempel-Ziv algorithms and vector
quantization to design the first practical implementations of lossy image compression based on
approximate string matching. The problem of "selecting amongst multiple matches" mentioned above was
termed the "Match Heuristic" in their work; see also Storer [18, p. 111]. Steinberg and Gutman [19]
and Luczak and Szpankowski [20] considered the fixed-database version of the Lempel-Ziv algorithm,
and provided sub-optimal performance guarantees. However, Yang and Kieffer [13] established that
all previous fixed-database extensions of the Lempel-Ziv algorithm are suboptimal.

Kontoyiannis [21] presented a scheme where multiple databases are used at the encoder, which
must also be known to the decoder. However, when the reproduction alphabet is large, the number
of training databases is unreasonably large. Atallah et al. [22] considered a cubic-time, adaptive
algorithm (PMIC) in the spirit of LZ77. Their algorithm is not sequential in the sense of [23], since
its encoding delay grows faster than o(n). Alzina et al. [24] combined ideas from [22] and [16], [17] to
propose a 2D-PMIC algorithm that is more suited for two dimensional images.

Continuing the quest for Lempel-Ziv-type lossy algorithms, Zamir and Rose [25] further studied the
algorithm in [15]. From the multiple codewords that may match a source word, they suggest choosing
one "at random". From a theoretical perspective, by assuming uniqueness, Zamir and Rose [26]
proposed a natural type selection scheme for finding the type of the optimal reproduction distribution.
In later work, Kochman and Zamir [27] pointed out that the theoretical procedure in [26] is in itself
not practical and demonstrated an application of natural-type selection to on-line codebook selection
from a parametric class. Along a different line, Yang and Kieffer [28] have proposed exponential-time
Lempel-Ziv-type block codes that are universal (for stationary, ergodic sources and for individual
sequences). In a related work, Yang and Zhang [29] presented fixed-slope universal lossy coding
schemes that search for the reproduction sequence through a trellis in a fashion reminiscent of the
Viterbi algorithm.

The lossy coding problem has been approached using methods fundamentally different from the
Lempel Ziv like approaches as well. Matsunaga and Yamamoto [30] considered LDPC codes for
lossy data compression. In this line of work, Wainwright and Maneva [31] looked at message passing
and Low Density Generator Matrices (LDGM), while Martinian and Wainwright [32] looked into the
construction of LDGMs and compound code constructions, showing the existence of compound
LDGM-LDPC constructions that achieve the rate-distortion bound. Further bounds on the performance of
these constructions have been considered in [33]. In another line of attack, Jalali, Montanari and
Weissman approach the problem using dynamic programming [34].

Challenges 

In this paper, we consider lossy encoding of memoryless data. What constitutes progress at a
conceptual level? The algorithm we consider, Codelet Parsing, reduces to the Lempel Ziv algorithm
(LZ78 version) for lossless encoding, and we believe that Codelet Parsing may be optimal for
stationary ergodic sources as well.

One way to think of lossy encoding is as follows. We construct a codebook C, a set of sequences 
substantially smaller than the set of all possible sequences. Given any sequence x, we fix an element of 
C as its representation. Thus, for any sequence x, we only have to describe which element in C it maps 
to (rather than all possible sequences). If C has been chosen well, every sequence has some sequence 
of C that is fairly close to it. Thus the crux of the lossy compression problem is (i) to construct C, and 
(ii) to search for a representation. The minimum size of C is characterized through the rate distortion
function r(D).

We sketch a rough picture of the problem of lossy compression now. While not necessary for the
results of our paper, most of the statements below can be made formal. If a length-n sequence X is
generated iid Bernoulli(p), the probability that X matches a length-n sequence y to within distortion D
is highest if the type of y is $(p - D)/(1 - 2D)$. The probability of a match is then
$2^{-n r(D)}/\mathrm{poly}(n)$. Thus if we are to encode length n sequences, we need
$|C| \ge \mathrm{poly}(n)\, 2^{n r(D)}$ in order to satisfy the distortion budget D. In fact, a randomly
chosen C from sequences with type $(p - D)/(1 - 2D)$ will cover almost all input sequences with size
$|C| = \mathrm{poly}(n)\, 2^{n r(D)}$. Thus random coding uses $\ge n r(D) + O(\log n)$ bits to represent
a string. This approach is clearly not practical (both construction and search take exponential time)
and we look for more efficient ways to achieve the goal by using more structured codebooks.

Lempel Ziv approaches circumvent the problem of exponential encoding and search time with a
recursive construction. Rather than construct codebooks for length n sequences, one constructs a
set $\mathcal{D}$ of sequences of length logarithmic in n. Codebooks over lengths smaller than the
sequence length are often referred to as dictionaries in the Lempel-Ziv literature to avoid confusion,
and we adopt the same convention. The algorithm splits the length-n sequence X into phrases of
logarithmic length, representing each phrase by one of the elements of $\mathcal{D}$. The strength of
this approach is that the construction of $\mathcal{D}$ happens naturally using just the data to be
encoded, and is known to capture the probability laws governing the data as long as the data is
stationary ergodic (not just memoryless).

Furthermore, a simple argument about recurrence time of events shows that it is not possible to
estimate probabilities of all strings of length $\Omega(\log n)$ using n samples, a fact that will come
into play if the algorithms are to be extended to all stationary ergodic sources. Thus, the dictionaries
cannot be over sequences longer than $O(\log n)$ if we have the goal of extending our algorithm to all
stationary ergodic sources.

What should we expect from all this? We should expect an approach using the Lempel Ziv theme
to have redundancy (the excess bits over the rate distortion term n r(D)) commensurate with random
encoding of sequences of length (log n)/r(D). Comparing with the numbers given above for random
encoding of length n sequences, we conclude that such approaches use
$n r(D) + O\left(\frac{n \log\log n}{\log n}\right)$ bits for length n sequences. However, the complexity
of search through $\mathcal{D}$ to represent any phrase of length (log n)/r(D) is linear in n, leading
to an overall complexity of $O(n^2/\log n)$ in order to encode a sequence of length n.

Note that actually adapting the Lempel Ziv theme is non-trivial. In particular, how does one
guarantee that the dictionary $\mathcal{D}$ constructed does match the performance of a randomly chosen
and good codebook of length (log n)/r(D)? This is analogous to the channel coding problem for
communication, where a randomly chosen code is good with high probability, yet constructing practical
codes that are optimal took almost 60 years of intense research. Indeed, the connections run deeper:
lossy compression is a covering problem, while channel coding is a packing problem.

Here we show that Codelet Parsing built on the Lempel Ziv theme has a redundancy of
$O\left(\frac{\log\log n}{\log n}\right)$ per symbol, as expected. However, Codelet Parsing constructs the
dictionary $\mathcal{D}$ in a more structured manner than brute force random construction, and finding a
match requires only poly(log n) (not linear) complexity on average. Thus Codelet Parsing is a quasi
linear algorithm. At the level of encoding length-n sequences, this is seemingly only an improvement
from quadratic to linear complexity (notwithstanding the fact that it is not even clear how to achieve
quadratic complexity), but such an improvement also indicates a new way to build the dictionary.

Contributions 

This paper builds on the Lempel Ziv approach along the lines of [35] [MIE]. In particular, we analyze
an idealization of a Lempel Ziv like algorithm called Codelet Parsing, proposed by the authors in [37].
In a preliminary paper [6], we showed convergence of Codelet Parsing's coding rate (the number of
bits used to describe the lossy representation of a string, normalized by the length of the string) to
the rate distortion function, when the input string is iid and the distortion is fixed to be Hamming
distortion.

In this paper, we obtain a covering lemma that allows us to characterize the rate of convergence
of the coding rate as $O\left(\frac{\log\log n}{\log n}\right)$ (exponentially better than the loose
estimate in [6]). It is important to highlight how this result substantially strengthens [6].

In particular, we note a few important points. The distorted phrases are of length roughly
(log n)/r(D) and are obtained by searching through a codebook (maintained as a complete binary
tree as in the LZ78 setup).

1. The sequences in this codebook are not obtained by exhaustive search. Instead, they are
recursively obtained by calling on codebook constructions over shorter lengths, of length
$O(\log\log n)$. In addition, searching for an approximate match does not require an exhaustive
search over sequences of length (log n)/r(D).

2. The shorter codebook constructions work in synergy, in a manner of speaking, since convergence
to r(D) is $O\left(\frac{\log\log n}{\log n}\right)$. This rate is almost what we should expect even for
exhaustive codebook constructions of length $\Theta(\log n)$.

A consequence of the first point is that we obtain an algorithm of quasi-linear (linear with log
factors) complexity. This is a savings from the potentially super-quadratic complexity if we
exhaustively construct or search through codebooks of length (log n)/r(D).

To put the second point in perspective, the convergence rate of our algorithm is exponentially
faster than what could have been obtained by partitioning x into phrases of length
$\Theta(\log\log n)$, and representing each phrase in a lossy manner using a codebook of length
$O(\log\log n)$.



1 Preliminaries and combinatorial interpretations 
1.1 Rate-Distortion and Lower-Mutual-Information 

Let $X^n = X_1, X_2, \ldots, X_n$, where $X_i \in \{0,1\}$ for all i, be a realization of an iid process P,
with the marginal distribution on $X_i$ being $P(X_i = 1) = p$. We represent a string of length n, $X^n$,
using a potentially distorted $Y^n \in \{0,1\}^n$. Let $d(X^n, Y^n)$ denote the Hamming distortion between
$X^n$ and $Y^n$. We adhere to an expected distortion constraint, namely $\mathbb{E}\, d(X^n, Y^n) \le D$.
It is customary to call $Y^n$ the codeword used for the lossy representation of $X^n$. Note that $Y^n$
is not necessarily iid and is determined by the algorithm used to pick codewords.

The rate distortion function captures, asymptotically, the minimum number of bits that have to 
be used to describe strings of length n to within distortion D. Interestingly, it has a single letter 
characterization, meaning that it can be specified by looking at the joint distribution over a pair of 
bits (Y, X) such that P(X = 1) = p. The conditional distributions on X given Y correspond to a
channel, while Y is interpreted as the channel input and X the channel output. 

Let W be the set of all possible channels. The rate-distortion function is

\[ r(D) = R(P, D) = \min_{w \in W:\ \mathbb{E}\, d(X,Y) \le D} I(X, Y), \]

where $I(X, Y)$ denotes the mutual information and $Y \sim q'$ means $P(Y = 1) = q'$.
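
For the binary source with Hamming distortion considered here, this minimization has a well-known closed form, $r(D) = h(p) - h(D)$ for $0 \le D < \min(p, 1-p)$ and zero beyond, where $h$ is the binary entropy in bits. The following Python sketch evaluates it, together with the reproduction type $(p - D)/(1 - 2D)$ quoted in the introduction. The function names are illustrative and not taken from the paper.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def rate_distortion(p: float, D: float) -> float:
    """r(D) = h(p) - h(D) for a Bernoulli(p) source under Hamming distortion."""
    if D >= min(p, 1.0 - p):
        return 0.0  # a constant codeword already meets the distortion budget
    return binary_entropy(p) - binary_entropy(D)


def optimal_reproduction_type(p: float, D: float) -> float:
    """Fraction of ones (p - D) / (1 - 2D) in the optimal reproduction."""
    if D >= min(p, 1.0 - p):
        raise ValueError("defined only for D < min(p, 1 - p)")
    return (p - D) / (1.0 - 2.0 * D)
```

For instance, with p = 0.3 and D = 0.1 the optimal reproduction type is 0.25, and one can check that passing a Bernoulli(0.25) input through a binary symmetric channel with crossover 0.1 gives back a Bernoulli(0.3) output.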

The lossy coding problem is essentially a covering problem. Suppose we consider length-n sequences
$X^n$ generated by an iid measure P, satisfying $P(X_i = 1) = p$ (as before). Say we want the
probability of length n sequences of type p that are within distortion D from a sequence y with type
q. This probability again has a single letter characterization in terms of a pair of binary variables
(Y, X), where $Y \sim q$ and $X \sim p$. In particular, we define

\[ I_m(q, p, D) \stackrel{\mathrm{def}}{=} \min_{w \in W:\ X \sim p,\ Y \sim q,\ \mathbb{E}\, d(X,Y) \le D} I(X, Y), \]

where we are minimizing the mutual information $I(X, Y)$ over all joint distributions consistent with
the marginals being $X \sim p$ and $Y \sim q$, and $\mathbb{E}\, d(X, Y) \le D$. The probability we want
is then $2^{-n I_m(q,p,D) + O(\log n)}$. $I_m(q, p, D)$ is a convex function of q for a fixed p, with a
minimum at the optimal reproduction type q*.
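
In the binary case the joint distribution of (X, Y) with marginals p and q has a single free parameter, so the lower mutual information can be evaluated directly. The sketch below is one way to do that in Python, assuming the constraint is that the probability of X and Y differing is at most D. The parametrization by a = P(X = 1, Y = 1) and all function names are ours, not the paper's.

```python
import math


def _term(joint: float, product: float) -> float:
    """One term joint * log2(joint / product) of the mutual information."""
    return 0.0 if joint <= 0.0 else joint * math.log2(joint / product)


def mutual_information(p: float, q: float, a: float) -> float:
    """I(X;Y) in bits when P(X=1) = p, P(Y=1) = q and P(X=1, Y=1) = a."""
    cells = [
        (a, p * q),
        (p - a, p * (1 - q)),
        (q - a, (1 - p) * q),
        (1 - p - q + a, (1 - p) * (1 - q)),
    ]
    return sum(_term(joint, product) for joint, product in cells)


def lower_mutual_information(q: float, p: float, D: float) -> float:
    """Minimum of I(X;Y) over joints with X ~ p, Y ~ q and P(X != Y) <= D."""
    if abs(p - q) > D:
        return math.inf  # no coupling of p and q has distortion <= D
    # P(X != Y) = p + q - 2a, so the constraint reads a >= (p + q - D) / 2.
    # I is convex in a and vanishes at a = p * q, so clip to the feasible set.
    a = max(p * q, (p + q - D) / 2.0)
    return mutual_information(p, q, a)
```

At p = 0.3, D = 0.1 and q = 0.25 this returns h(0.3) - h(0.1), the rate distortion function, and the value at q = 0.3 is larger, consistent with the minimum over q being attained at the optimal reproduction type.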

Intuitively speaking, codewords with the optimal reproduction type have the largest D-balls
among sequences of type p, and hence yield the best covering. For a precise formulation of the above
concepts, see [HI [26]. However, to just obtain the estimates given above, a simple combinatorial 
calculation followed by picking the dominant term suffices. 



1.2 Ballot box problem 

We have an expected distortion constraint between a sequence $X^n$ generated by P and its codeword
$Y^n$. As we will see, we obtain $Y^n$ by first breaking $X^n$ into disjoint phrases
$X^n = X^{(1)}, \ldots, X^{(r)}$ (where $r = O(n/\log n)$), and representing each phrase $X^{(i)}$ by a
codelet $y^{(i)}$ of the same length, such that $d(x^{(i)}, y^{(i)}) \le D$. Such an approach however
leads to lack of sufficient structure in the codebooks generated, leading to quadratic complexity for
the algorithm.

To better implement search and representation among codelets, we impose a more restrictive
constraint in picking codelets. We will require not only that $d(x^{(i)}, y^{(i)}) \le D$ in the example
above, but that every prefix of $x^{(i)}$ be within distortion D of the corresponding prefix of
$y^{(i)}$. Namely, for any l, if x' and y' are the l-length prefixes of $x^{(i)}$ and $y^{(i)}$
respectively, we require that $d(x', y') \le D$ as well. We then write $x^{(i)} \sim y^{(i)}$ and say
that $x^{(i)}$ matches $y^{(i)}$.
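
The difference between the plain distortion constraint and the prefix constraint can be made concrete with a short Python sketch. It assumes bit strings are given as Python strings of '0' and '1', that distortion is the fraction of differing positions, and that the inequality is non-strict; the function names are ours.

```python
def hamming_distortion(x: str, y: str) -> float:
    """Fraction of positions in which two equal-length bit strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def prefix_match(x: str, y: str, D: float) -> bool:
    """True if every prefix of x is within distortion D of the same-length prefix of y."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    errors = 0
    for length, (a, b) in enumerate(zip(x, y), start=1):
        errors += a != b
        if errors > D * length:  # the prefix of this length exceeds the budget
            return False
    return True
```

Both "0001" and "1000" are within Hamming distortion 1/4 of "0000", but only "0001" matches in the prefix sense, since the one-bit prefix of "1000" already has distortion 1.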

The important thing is that the probability that a codelet y finds a match is essentially the 
probability of all sequences with distortion D from y. In fact 

(maybe state stronger too?) 

Lemma 1. Let length n sequences X be generated by an iid source P, and let the type of y be q,
the optimal reproduction type for P and the distortion D. Then

\[ P(X \sim y) \ge \frac{(1 - D/2)^2}{n}\, P(B(y, D)), \]

where $X \sim y$ is as defined in the text preceding this Lemma.

Proof We adapt the so-called Cycle Lemma of Dvoretzky and Motzkin [38], which has been rediscovered
several times [39] in the literature.

Consider the sequences $y_0$ and $y_1$ corresponding to the zeros and ones of y. We first look for
sequences $x_0$ and $x_1$ satisfying $d(x_0, y_0) \le D$ and $d(x_1, y_1) \le D$, and make a sequence x
by replacing the zeros of y with $x_0$ and the ones of y with $x_1$. Let B be the set of all such
sequences x.

Suppose $(x_0)$ and $(x_1)$ are cyclic shifts of some valid $x_0$ and $x_1$ respectively. Then the cycle
lemma of [38] states that at least a $(1 - D/2)$ fraction of these cyclic shifts are $\sim y_0$ and
$\sim y_1$ respectively; we call them good shifts. Note that if we replace both $y_0$ and $y_1$ with
good shifts of $x_0$ and $x_1$ to obtain a sequence x, then it follows that $x \sim y$. In addition, all
sequences formed by replacing the zeros and ones with (good or otherwise) shifts of $x_0$ and $x_1$
have the same type, and hence the same probability under P. Thus

\[ P(X \sim y) \ge (1 - D/2)^2\, P(B). \]

Furthermore, it is easy to verify that if the type of y is the optimal reproduction type, (remove and
use only previous equation; the next equation is unnecessary and never used)

\[ P(B) \ge \frac{1}{n}\, P(B(y, D)). \qquad \Box \]

2 Codelet parsing 

At the core of the paper is the Codelet Parsing algorithm for lossy compression with a Hamming
distortion constraint. When no distortion is allowed, the algorithm reduces to the lossless Lempel
Ziv algorithm. Codelet Parsing sequentially parses the source sequence into non-overlapping phrases,
mapping each phrase to a codelet in a dictionary. The dictionary in turn is updated.

At the block level, the Codelet Parsing algorithm maps a source sequence $x^n$ to a distorted sequence
$y^n$, and then encodes and transmits the latter without loss using an LZ78 encoder. We describe the
algorithm with an example; full details are available in [37].

Example 1. Consider the string $x_1^{13} = 0110101101000$, which we will encode with allowable
Hamming distortion $D \le 1/2$. We initialize a codebook $C_0 = \{0, 1\}$, call the members of the
codebook codelets, and denote the type of a string v by $\tau(v)$. At each step, we choose a codelet to
represent a portion of the unparsed string, such that the codelet is within distortion 1/2 of the
prefix of the unparsed string that has the same length.

At step t = 1, the unparsed string is 0110101101000. The codelet 0 is within distortion 0 of the
prefix (0), while the codelet 1 does not match any prefix to within distortion 1/2. The first bit
of $x_1^{13}$ is represented by the codelet 0, and the matching codelet in $C_0$ is replaced by its one
bit extensions, namely 00 and 01, to yield $C_1$.

Now $C_1 = \{00, 01, 1\}$, and the unparsed segment of the string is 110101101000. Note that codelet
1 is within distortion 0 of the prefix (1), while the codelet 01 is within distortion 1/2 of the
prefix (11). We have two choices: represent the first bit of the unparsed segment with the codelet 1,
or the first two bits of the unparsed segment with 01.

To decide, we build the set of matching codelets $\mathcal{M}_1 = \{01, 1\}$. To each codelet
$m \in \mathcal{M}_1$, associate the prefix r of $x_1^{13}$ that will have been parsed thus far if m is
chosen, and compute the metric $I_m(\tau(m), \tau(r), D)$. Therefore for m = 01, the associated prefix r
of $x_1^{13}$ is 011 (0 from the first round, and 11 from this round). The metric for the codelet 01 is
then $I_m(\tau(01), \tau(011), 1/2)$. Choose the codelet with the minimum metric, and update the codebook
by replacing the chosen codelet with its one bit extensions. Suppose the chosen codelet is 01; then
$C_2 = \{00, 010, 011, 1\}$, and the bits 11 are represented by 01 in this round. The unparsed string
for the next round is then 0101101000. □
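
The two rounds of Example 1 can be replayed with a few lines of Python. This is only a sketch of the matching and codebook-update steps: the selection metric is not implemented, and the choice of 01 in the second round is hard-coded to follow the example. The helper names are hypothetical.

```python
def mismatches(a: str, b: str) -> int:
    """Number of positions in which two equal-length bit strings differ."""
    return sum(u != v for u, v in zip(a, b))


def matching_codelets(codebook: set, unparsed: str, D: float) -> list:
    """Codelets within distortion D of the equally long prefix of `unparsed`."""
    return sorted(
        c for c in codebook
        if len(c) <= len(unparsed) and mismatches(c, unparsed[:len(c)]) <= D * len(c)
    )


def extend(codebook: set, chosen: str) -> set:
    """Replace the chosen codelet by its two one-bit extensions."""
    return (codebook - {chosen}) | {chosen + "0", chosen + "1"}


x = "0110101101000"
C0 = {"0", "1"}
M0 = matching_codelets(C0, x, 0.5)      # round 1: only the codelet 0 matches
C1 = extend(C0, "0")
M1 = matching_codelets(C1, x[1:], 0.5)  # round 2: the codelets 01 and 1 match
C2 = extend(C1, "01")                   # the example supposes 01 is chosen
remaining = x[3:]                       # unparsed string for the next round
```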

As we saw in the second round above, there are usually multiple ways to parse the incoming source 
string and map it into codewords. Indeed the crux of the algorithm is the answer to: 

How do we select between multiple parsings? 

Interestingly, the most natural extension of the Lempel Ziv algorithm to the lossy case, namely
picking one of the longest codelets among the matches, is proven suboptimal in [20], in a specific
LZ77 setting.

3 Idealization of codelet parsing 

To understand the Codelet Parsing algorithm described above, we idealize it in order to isolate the
core phenomena underlying the algorithm, and to make it amenable to a simple analysis.

(remove, add universal section) For the sake of simplicity, and because we are only analyzing the
iid case in this paper, we assume that the Idealized Codelet Parsing algorithm knows the underlying
statistics of the data. Note that in the iid case, we learn the underlying statistics at the rate of
$O(1/\sqrt{s})$, where s is the length of the string we have observed thus far, and hence at an
exponentially faster rate than we would expect for any LZ type algorithm.

Modifications 

Known horizon First, we assume that the blocklength of the input string x is known in advance.
Note that while this aids analysis, it is not a stringent restriction. In practice, a modification of
the doubling trick ([50], Chapter 2.3) can be used to handle strings whose length is unknown, with
asymptotically no degradation in performance. For details, please see [6].

Let y be a length-L sequence with the optimal reproduction type, and let

\[ p_L = P(X \sim y), \]

where X is a sequence generated by P. Now let

Further, call an input sequence z of length $\ell$ $\epsilon$-typical if
$|\ell h(p) + \log P(z)| < \ell\epsilon$, and consider the set of all $\ell$-length $\epsilon$-typical
sequences.
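
As a small illustration, the typicality test can be written out in Python. The sketch assumes the usual normalization, in which the log-probability of z is compared against its length times h(p) with logarithms to base 2, and it requires 0 < p < 1. The function names are ours.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits for 0 < p < 1."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def log2_probability(z: str, p: float) -> float:
    """log2 of the probability of the bit string z under iid Bernoulli(p)."""
    ones = z.count("1")
    return ones * math.log2(p) + (len(z) - ones) * math.log2(1.0 - p)


def is_typical(z: str, p: float, eps: float) -> bool:
    """True if | len(z) * h(p) + log2 P(z) | < len(z) * eps."""
    ell = len(z)
    return abs(ell * binary_entropy(p) + log2_probability(z, p)) < ell * eps
```

Under p = 0.1, the all-zero string of length 4 is not 0.1-typical (its log-probability is too close to zero) but it is 0.5-typical.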

Updating the dictionary The Idealized Codelet Parsing algorithm initializes the dictionary with
all $2^\ell$ $\ell$-length sequences. Among them, it first obtains a set $\mathcal{D}_\ell$ of $M_\ell$
codelets of length $\ell$. Then, every sequence in $\mathcal{D}_\ell$ is replaced with all its $2^\ell$
$\ell$-bit extensions, and among them length-$2\ell$ codelets are chosen to obtain
$\mathcal{D}_{2\ell}$. The algorithm proceeds by then updating the dictionary with longer codelets,
forming in turn the sets $\mathcal{D}_{k\ell}$ for increasing values of k.

Selecting codelets by partial matching To pick any codelet to represent a portion of the unparsed
input sequence, the algorithm finds the longest matching codelet from the leaves of the dictionary
tree.

Note that because we map any codelet y only to sequences x such that $x \sim y$, finding the longest
match does not, with high probability, require exhaustive search among the codelets. Following is an
algorithm that does the search among L-length codelets in $O(2^\ell\, \mathrm{poly}(L))$ operations
with high probability.

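Let $x = x_1, x_2, \ldots$ be the unparsed segment of the input, and let

\[ Z_\ell = \{ y \in \mathcal{D}_\ell : y \sim x_1^\ell \} \]

be the partial matches at level $\ell$. Among all the descendants of $Z_\ell$ in $\mathcal{D}_{2\ell}$,
find all partial matches for $x_1^{2\ell}$ to obtain $Z_{2\ell}$. The crucial point to observe is

Property 1. If there exists $y \in \mathcal{D}_{2\ell}$ such that $y \sim x_1^{2\ell}$, then
$y_1^\ell \sim x_1^\ell$, namely $y_1^\ell \in Z_\ell$. □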

Therefore $Z_{2\ell}$ contains all sequences in $\mathcal{D}_{2\ell}$ that match $x_1^{2\ell}$. We would
not have this property if we simply obtained the sets Z by picking codelets that satisfied the
distortion constraint alone. Combined with Lemma 2 below, which shows that with high probability
$|Z_{k\ell}|$ grows polynomially rather than exponentially, obtaining $Z_L$ for any L can be done in
time polynomial in L. In the low probability event that $|Z_{k\ell}|$ grows faster than the Lemma
bound, we simply give up.
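
The level-by-level search can be sketched in Python as follows. The sketch assumes the dictionary is handed over as a list of sets, one per level, in which every codelet at a level extends some codelet of the previous level, and it uses the prefix-matching relation with a non-strict inequality. It does not implement the give-up rule or any bound on the number of partial matches, and the names are ours.

```python
def prefix_match(x: str, y: str, D: float) -> bool:
    """True if every prefix of x is within distortion D of the same-length prefix of y."""
    errors = 0
    for length, (a, b) in enumerate(zip(x, y), start=1):
        errors += a != b
        if errors > D * length:
            return False
    return True


def longest_partial_matches(levels: list, x: str, D: float) -> set:
    """Deepest non-empty set of partial matches for the unparsed input x.

    levels[k] holds the codelets of length (k + 1) * ell, and every codelet
    in levels[k] extends some codelet in levels[k - 1].
    """
    best = set()
    survivors = None  # partial matches at the previous level
    for level in levels:
        if not level:
            break
        length = len(next(iter(level)))
        if length > len(x):
            break
        if survivors is None:
            candidates = level
        else:
            parent_len = len(next(iter(survivors)))
            # Property 1: only descendants of earlier partial matches can match.
            candidates = {y for y in level if y[:parent_len] in survivors}
        survivors = {y for y in candidates if prefix_match(x[:length], y, D)}
        if not survivors:
            break
        best = survivors
    return best


levels = [{"00", "01", "10", "11"}, {"0000", "0011", "0101", "1100", "1111"}]
found = longest_partial_matches(levels, "0100", 0.5)
brute = {y for y in levels[1] if prefix_match("0100", y, 0.5)}
```

Here the pruned search and the brute-force scan of the second level return the same set, which is what Property 1 guarantees; the saving is that the codelets 1100 and 1111 are never compared against the input at the second level.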

Lemma 2. For all $\delta$, with probability $> 1 - \delta$, simultaneously for all k,

$|Z_{k\ell}| <$ □

4 Optimality of Codelet Parsing 

We show that the Idealized Codelet Parsing algorithm is optimal. Let $X^n = X_1, \ldots, X_n$ be
generated by a binary memoryless source P, with $P(X_i = 1) = p$. Let the target average Hamming
distortion constraint be D. Let $Y^n$ be the distorted representation of $X^n$ output by the
algorithm, and let $\mathcal{L}(Y^n)$ be the number of bits required to describe $Y^n$. Then,

Theorem 3. For the Idealized Codelet Parsing algorithm,

\[ \frac{1}{n}\, \mathbb{E}\, \mathcal{L}(Y^n) \le r(D) + O\left(\frac{\log\log n}{\log n}\right) \]

and $\mathbb{E}\, d(X^n, Y^n) \le D$. □

The expectation above is taken over all the choices made by the algorithm and over the input
sequences.

Analysis of the cover 

We first establish that the codelets provide a good cover for the source phrases. 

The algorithm chooses codelets of lengths $\ell$, $2\ell$ and so on. We will often refer to the length
of codelets as their depth, since they are either internal nodes or leaves of the dictionary tree.
Consider the i'th (in sequence) codelet chosen at depth L of the dictionary tree. Note that the
dictionary is itself random (dictated by X and the random choices made while populating it), and we
denote by $\mathcal{D}_L$ the dictionary at depth L once the algorithm has processed a length n
sequence. For any sequence X with length L, let $T_L(X)$ be the number of codelets in $\mathcal{D}_L$
that are within the distortion budget from X. We will drop the argument of $T_L$ when writing
expectations for simplicity. All expectations that follow are over X and $\mathcal{D}$.

As mentioned before, too many matches is a sign of suboptimality. To quantify this, we compute
$\mathbb{E} T_L$ and $\mathbb{E} T_L^2$. Together, they provide a lower bound on the probability that
$T_L > 0$, namely the probability that X is covered by some element of the dictionary at depth L.

Clearly $\mathbb{E} T_L$ is easy to compute for any L by linearity of expectation. However,
$\mathbb{E} T_L^2$ is somewhat trickier to bound, but is well behaved. We show in Lemma 8 that when
averaged over all possible codebooks, $\mathbb{E} T_L^2$ is lower than the corresponding expectation if
we chose codelets at random. From [], random choice of codelets leads to good covers with overwhelming
probability. We will therefore conclude that the cover gets better as we parse longer. Computation of
$\mathbb{E} T_L^2$ is somewhat involved, but the algebra is simplified for a Bernoulli 1/2 source.

We first note that the codebook construction contains symmetries that we will need to exploit for
Lemma 8.

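Lemma 4. Let $\mathcal{T}_L$ be the set of length-L sequences of type q, so that
$|\mathcal{T}_L| = \binom{L}{qL}$. For all $y \in \mathcal{T}_L$,

\[ P(y \in \mathcal{D}_L) = \frac{M_L}{|\mathcal{T}_L|}. \]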

Proof Suppose the length of y is $L = k\ell$ and let
$y' = {y'}_1^{\ell}, {y'}_{\ell+1}^{2\ell}, \ldots, {y'}_{(k-1)\ell+1}^{k\ell}$. Note that each
${y'}_{j\ell+1}^{(j+1)\ell}$ can be obtained from the corresponding subsequence $y_{j\ell+1}^{(j+1)\ell}$
by some permutation of bit locations of the latter, since both bit sequences have the same type.
Represent these permutations by $\sigma_0, \ldots, \sigma_{k-1}$, and we write
${y'}_1^{\ell} = \sigma_0(y_1^{\ell})$ as a shorthand. These permutations are not unique; however, we
will fix one valid value for each of $\sigma_0, \ldots, \sigma_{k-1}$.

Let XC(y) be the set of length n input sequences and the corresponding choices between multiple
matches made by the algorithm that induce $y \in \mathcal{D}_L$. Corresponding to each input sequence x
that could induce y, we represent the choices as numbers, one for each phrase, indicating (in
lexicographic order) which of the codelets that $\sim x$ are chosen. Thus,

\[ XC(y) = \{ (x, c) : \text{choices } c \text{ on sequence } x \text{ induce } y \}. \]

Similarly for XC(y').

To see that there is a bijection between XC(y') and XC(y), take an element $(x, c) \in XC(y)$.
From x, we obtain $x' \in XC(y')$ by manipulating each phrase obtained in the parsing of x. Suppose
$z = z_1^{\ell} \ldots z_{(m-1)\ell+1}^{m\ell}$ is a phrase obtained during the parsing of x. If
$m \le k$ we replace z with

\[ z' = \sigma_0(z_1^{\ell}) \ldots \sigma_{m-1}(z_{(m-1)\ell+1}^{m\ell}), \]

and if $m > k$ we replace z with

\[ z' = \sigma_0(z_1^{\ell}) \ldots \sigma_{k-1}(z_{(k-1)\ell+1}^{k\ell})\, z_{k\ell+1}^{(k+1)\ell} \cdots \]

Now to make choices among competing matches, instead of lexicographic ordering, we use the
lexicographic ordering under the permutations $\sigma_0, \ldots, \sigma_{k-1}$ (replace with
concatenation symbol). Now, note that if (x, c) yielded y, then (x', c) will yield y'. Finally, since
iid probabilities of sequences do not change when their bit locations are permuted, it follows that

\[ P(y \in \mathcal{D}_L) = P(XC(y)) = P(XC(y')) = P(y' \in \mathcal{D}_L). \qquad \Box \]

Lemma 5. Let yi,2/2 G be identical in the first r length segments. Then, 

P(yi andys < ^V7^^— . □ 

The next Lemma would easily follow from the linearity of expectation, but we provide a slightly
more convoluted proof using Lemma 4 above. Let $N_{L,\mathcal{D}}(X)$ be the number of codelets that
match X in the randomly chosen codebook $\mathcal{D}$ for the Codelet Parsing algorithm described above.

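Lemma 6. $\mathbb{E} N_{L,\mathcal{D}} = M_L p_L$.

Proof Note that

\[
\mathbb{E} N_{L,\mathcal{D}}
= \sum_{x} P(x) \sum_{y} P(y \in \mathcal{D}_L \text{ and } y \sim x)
= \sum_{y} P(y \in \mathcal{D}_L) \sum_{x \in B(y,D)} P(x)
\overset{(a)}{=} \sum_{y} \frac{M_L}{|\mathcal{T}_L|}\, P(B(y, D))
= M_L p_L,
\]

where (a) follows from Lemma 4. □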

Lemma 7. Let yi and yL be two sequences with type q. Let y^ and y^ be two sequences with type 
q and length Then, 

P{B{yLyi, d) n BiYLYi, d)) < P{B{yL, d) n B{yL,d)) f ^) . □ 



Lemma 8. Let Nl^x>0^) be the number of codelets of length L that match X in codebook Dl, and 
let Nl^x>0^) be the number of codelets that match X in a codebook D. Then 



where = F{B{y,d)) for any y G T^. □ 

For comparison, let us consider the corresponding expectation for random codebook constructions
of length L. Here we use codebooks $C_L$ populated with sequences of type q as follows. Generate
independent sequences of length L, with the L-length sequence generated in step i being $X^{(i)}$. Each
$X^{(i)}$ is in turn obtained by generating L bits iid Bernoulli(1/2). Initialize
$C_L^{(0)} = \emptyset$. At every step i, update $C_L^{(i)} = C_L^{(i-1)} \cup \{y\}$, where y is a
randomly chosen length L sequence of type q such that $X^{(i)} \in B(y, D)$. Stop after the
$i = M_L$-th codelet is chosen, and let $C_L = C_L^{(M_L)}$. For such a random codebook construction,
it is easy to see that

The above Lemmas imply 






Corollary 9. P{N,^,,^ > 0) > p(^,.;L'^^+"i;(t,pj 

Proof Cauchy-Schwarz inequality. □
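
The Cauchy-Schwarz step is presumably the standard second-moment argument. We spell it out for a generic nonnegative count N, which stands in for the number of matching codelets; this is the generic form of the bound and not a reconstruction of the exact right-hand side of Corollary 9.

\[
\mathbb{E} N = \mathbb{E}\big[ N \, \mathbf{1}\{N > 0\} \big]
\le \sqrt{\mathbb{E} N^2}\, \sqrt{P(N > 0)}
\quad \Longrightarrow \quad
P(N > 0) \ge \frac{(\mathbb{E} N)^2}{\mathbb{E} N^2}.
\]

A first moment together with an upper bound on the second moment therefore bounds the covering probability from below.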

The next cog in the proof is the observation that there cannot be too many "short" phrases in the
lossy representation.

Lemma 10. For n sufficiently large, the number of nodes in the dictionary with length shorter than 

logn-ie ■ ^ n (-, 
R(d) - (logn)^- ^ 

The details of the remainder of the proof are omitted, but follow a line of argument standard in
the LZ analysis literature. (Complete below)

The section populated by short phrases contributes at most redundancy $1/\log n$. Unrolling
Corollary 9, with high probability we find that some element of the dictionary matches an incoming
phrase. Describing such phrases takes at most $\log n$ bits, and such phrases by Lemma 10 have length
$> \log n - 7 \log\log n$, yielding a per symbol encoding rate of
$R(D) + O\left(\frac{\log\log n}{\log n}\right)$. With a small probability, no element of the dictionary
matches an incoming phrase, forcing us to describe such phrases bit for bit, adding another
$\Theta\left(\frac{\log\log n}{\log n}\right)$ to the coding rate.

(Complete above)